ABSTRACT


PageRank is defined as the stationary state of a Markov chain. The
chain is obtained by perturbing the transition matrix induced by
a web graph with a damping factor α that spreads part of the rank
uniformly. The choice of α is eminently empirical, and in most
cases the original suggestion α = 0.85 by Brin and Page is still
used. Recently, however, the behaviour of PageRank with respect
to changes in α was discovered to be useful in link-spam
detection [21]. Moreover, an analytical justification of the value
chosen for α is still missing. In this paper, we give the first
mathematical analysis of PageRank when α changes. In particular, we
show that, contrary to popular belief, for real-world graphs values
of α close to 1 do not give a more meaningful ranking. Then, we give
closed-form formulae for PageRank derivatives of any order, and an
extension of the Power Method that approximates them with
convergence $O(t^k \alpha^t)$ for the k-th derivative. Finally, we
show a tight connection between iterated computation and analytical
behaviour by proving that the k-th iteration of the Power Method
gives exactly the PageRank value obtained using a Maclaurin
polynomial of degree k. The latter result paves the way towards the
application of analytical methods to the study of PageRank.


Categories and Subject Descriptors 


G.2 [Discrete Mathematics]: Graph Theory; G.3 [Probability 
and Statistics]: Markov processes 


General Terms 


Algorithms, Experimentation, Measurement 


Keywords 
Web graph, PageRank, Approximation 


1. INTRODUCTION 


PageRank [17] is one of the most important ranking techniques
used in today's search engines. Not only is PageRank a simple,
robust and reliable way to measure the importance of web pages [3],
but it is also computationally advantageous with respect to other
ranking techniques in that it is query independent and content
independent. Otherwise said, it can be computed offline using only
the web graph¹ structure and then used later, as users submit
queries to the search engine, typically aggregated with other,
query-dependent rankings [4, 12, 16].

One suggestive way to describe the idea behind PageRank is as
follows: consider a random surfer that starts from a random page,
and at every step chooses the next page by clicking on one of the
links in the current page (selected uniformly at random among the
links present in the page). As a first approximation, we could define
the rank of a page as the fraction of time that the surfer spends on
that page on average. Clearly, important pages (i.e., pages that
happen to be linked by many other pages, or by few important ones)
will be visited more often, which justifies the definition. However,
we also allow the surfer to restart with probability 1 − α from
another node chosen uniformly at random, instead of following a
link.

As remarked in [5], a significant part of the current knowledge
about PageRank is scattered through the research laboratories of
large search engines, and its analysis “has remained largely in the
realm of trade secrets and economic competition”. Like the authors
of the aforementioned paper, however, we believe that a scientific
and detailed study of PageRank is essential to our understanding
of the web, and we hope this paper can be a contribution to such a
program.

PageRank is defined formally as the stationary distribution of a
stochastic process whose states are the nodes of the web graph. The
process itself is obtained by combining the normalised adjacency
matrix of the web graph (with some patches for nodes without
outlinks that will be discussed later) with a trivial uniform process
that is needed to make the combination irreducible and aperiodic, so
that the stationary distribution is well defined. The combination
depends on a damping factor α ∈ [0, 1), which will play a major
rôle in this paper. When α is 0, the web-graph part of the process
is annihilated, resulting in the trivial uniform process. As α goes
to 1, the web part becomes more and more important.

The problem of choosing α was curiously overlooked in the first
papers about PageRank: yet, not only does PageRank change
significantly when α is modified [19, 18], but the relative ordering
of nodes determined by PageRank can also be radically different [14].
The original value suggested by Brin and Page (α = 0.85) is the
most common choice. Intuitively, 1 − α is an amount of ranking
that we agree to give uniformly to each page. This amount will
then be funnelled through the outlinks of the node. A common form of
link spamming carefully funnels this amount towards a single page,
giving it a preposterously great importance.


¹ The web graph is the directed graph whose nodes are URLs and
whose arcs correspond to hyperlinks.


It is natural to wonder what is the best value of the damping
factor, if such a thing exists. In a way, when α gets close to 1 the
Markov process is closer to the “ideal” one, which would somehow
suggest that α should be chosen as close to 1 as possible. This
observation is not new, but it has some naivety in it.

The first issue is of a computational nature: PageRank is
traditionally computed using variants of the Power Method. The number
of iterations required for this method to converge grows with α, and
moreover more and more numerical precision is required as α gets
closer to 1.

But there is an even more fundamental reason not to choose a
value of α too close to 1: we shall prove in Section 3 that when α
goes to 1 PageRank gets concentrated in the recurrent states, which
correspond essentially to the nodes whose strongly connected
components have no passage toward other components. This phenomenon
gives a null PageRank to all the pages in the core component,
something that is difficult to explain and that is contrary to
common sense. In other words, in real-world web graphs the rank of
all important nodes (in particular, all nodes of the core component)
goes to 0 as α goes to 1.

Thus, PageRank oscillates between a meaningless uniform
distribution (α = 0) and a meaningless distribution concentrated
mostly in irrelevant nodes (α → 1). As a result, both for choosing
the correct damping factor and for detecting link spamming, being able
to describe the behaviour of PageRank when α changes is essential.
Recently, indeed, a sophisticated form of link-spam detection has
been based on the study of the value of PageRank with respect to
α [21].

To proceed further in this direction, it is essential that we have
at our disposal analytical tools that describe this behaviour. To this
purpose, we shall provide closed-form formulae for the derivatives
of any order of PageRank with respect to α, and an iterative
algorithm (an extension of the Power Method) that approximates them.

The most surprising consequence, easily derived from our
formulae, is that the vectors computed during the PageRank
computation for any α ∈ (0, 1) can be used to approximate PageRank for
every other α ∈ (0, 1). This happens because the k-th coefficient of
the Maclaurin series for PageRank can be easily computed during
the k-th iteration of the Power Method. This makes it easy to study
the behaviour of PageRank for any node while storing a minimal amount
of data.


2. BASIC DEFINITIONS 


Let G be the adjacency matrix of a directed graph of N nodes
(identified hereafter with the numbers from 0 to N − 1). A node is
terminal if it does not have outlinks, except possibly for loops (or,
equivalently, if all arcs incident on the node are incoming). If we
want to be specific about the presence of a loop, we shall use the
terms looped and loopless³.

We note that usually G is preprocessed before building the cor- 
responding Markov chain. Common processing includes removal 
of all loops (as nodes should not give authoritativeness to them- 
selves) and thresholding the number of links coming from pages of 
the same domain (to reduce the effect of link spamming). 


² Free Java code implementing all the algorithms described in this
paper will be available for download at http://law.dsi.unimi.it/.

³ In PageRank-related literature, loopless terminal nodes are more
commonly known as dangling nodes; the same kind of node is often
called a sink in graph-theoretic literature. Our choice avoids the
usage of ambiguous terms that have been given different meanings
in different papers.
If no loopless terminal nodes are present (note that after the
preprocessing sketched above they will be the only kind of terminal
nodes), we can just normalise uniformly to 1 the row sums of G by
multiplying it by $D^{-1}$, the inverse of the diagonal degree matrix.
However, D is not invertible if loopless terminal nodes are present.
The classical way to handle this situation consists in substituting
them with nodes that have one outgoing arc toward every node
(including the node itself). In other words, in G rows of zeroes are
substituted with rows of ones.
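The patching and normalisation just described can be sketched in a few lines of Python. This is only an illustration: the function name `transition_matrix` and the three-node graph are assumptions made for the example, not something taken from the paper or its Java code. Exact fractions are used so that the row sums can be checked without rounding.

```python
from fractions import Fraction

def transition_matrix(adjacency):
    """Row-normalise a 0/1 adjacency matrix (hypothetical helper).

    Rows of zeroes (loopless terminal nodes) are first replaced by rows
    of ones, so that every row can be divided by its sum.
    """
    n = len(adjacency)
    rows = []
    for row in adjacency:
        patched = row if any(row) else [1] * n  # arc toward every node
        degree = sum(patched)
        rows.append([Fraction(x, degree) for x in patched])
    return rows

# Toy graph (an assumption for this example): 0 -> 1, 0 -> 2, 1 -> 2,
# and node 2 is a loopless terminal node.
G = [
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
]
P = transition_matrix(G)
```

Every row of the resulting matrix sums to 1, and the row of the loopless terminal node becomes the uniform row.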

Let $\bar G$ be the (adjacency matrix of the) resulting graph, and D
be the diagonal matrix of the outdegrees of $\bar G$ (i.e., $d_{ii}$ is
the number of ones on the i-th row of $\bar G$). Let also $\mathbf{1}$
be the vector⁴ of all 1's, and v be any personalisation vector (a
vector whose elements are all non-negative and sum to 1, which is used
to bias PageRank w.r.t. a selected set of trusted pages).

We provide a toy example in the Appendix that will guide
the reader through the paper. In Table 5, the example graph G and
its modified version $\bar G$ are presented.


In the rest of the paper, we shall use the matrices defined in
Figure 1; some of them are functions of the damping factor
α ∈ [0, 1), and we will use a notation reflecting this fact. Note
that Q(α) is well defined for all α ∈ [0, 1), as (I − αP) is known to
be invertible [20].


$$
\begin{aligned}
P &:= D^{-1}\bar G \\
A(\alpha) &:= \alpha P + (1-\alpha)\,\mathbf{1}^T v \\
C(\alpha) &:= I - \alpha P \\
Q(\alpha) &:= P\,C(\alpha)^{-1}
\end{aligned}
$$


Figure 1: Basic PageRank definitions.


The PageRank vector r(α) is defined as the dominant eigenvector
of A(α); more precisely, as the only vector summing to 1 such
that r(α)A(α) = r(α). Noting that $r(\alpha)\mathbf{1}^T = 1$, we get


$$
\begin{aligned}
r(\alpha)\bigl(\alpha P + (1-\alpha)\,\mathbf{1}^T v\bigr) &= r(\alpha) \\
\alpha\, r(\alpha) P + (1-\alpha)\, v &= r(\alpha) \\
(1-\alpha)\, v &= r(\alpha)(I - \alpha P),
\end{aligned}
$$


which yields the following closed formula for PageRank:


$$r(\alpha) = (1-\alpha)\, v\, C(\alpha)^{-1}. \qquad (1)$$


This is Lemma 3 of [8], albeit in the original statement of this
lemma the factor 1 − α is missing, probably due to an oversight.
Note that (1) can be written as


$$r(\alpha) = (1-\alpha)\, v \sum_{t=0}^{\infty} (\alpha P)^t,$$

which makes the dependence of PageRank on incoming paths very
explicit.
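As a numerical sanity check of the series form of (1), the following Python sketch sums a truncated series and verifies that the result is (up to rounding) a fixed point of A(α). The matrix P, the uniform vector v, the function names and the truncation length are all assumptions chosen for this illustration.

```python
def vec_mat(x, M):
    """Row vector times matrix."""
    n = len(x)
    return [sum(x[i] * M[i][j] for i in range(n)) for j in range(len(M[0]))]

def pagerank_series(P, v, alpha, terms=200):
    """Approximate r(alpha) = (1 - alpha) v sum_t (alpha P)^t by a truncated sum."""
    term = list(v)                 # v (alpha P)^0
    total = [0.0] * len(v)
    for _ in range(terms):
        total = [a + b for a, b in zip(total, term)]
        term = [alpha * x for x in vec_mat(term, P)]   # next power of alpha P
    return [(1 - alpha) * x for x in total]

# Toy row-stochastic matrix (an assumption for this example).
P = [[0.0, 0.5, 0.5],
     [0.0, 0.0, 1.0],
     [1 / 3, 1 / 3, 1 / 3]]
v = [1 / 3, 1 / 3, 1 / 3]
alpha = 0.85

r = pagerank_series(P, v, alpha)
# Since r sums to 1, r A(alpha) = alpha r P + (1 - alpha) v.
r_times_A = [alpha * x + (1 - alpha) * vi for x, vi in zip(vec_mat(r, P), v)]
```

The truncation error is of the order of α raised to the number of terms, which is why a couple of hundred terms suffice for α = 0.85 in this sketch.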

The reader can see the PageRank vector in Figure 7 (the preference
vector v is the uniform vector). PageRank is represented as a
function of α in Figure 8.


3. GENERAL BEHAVIOUR 


In this section, we shall discuss the general behaviour of PageRank
as a function of the damping factor α, considering in particular
what happens when α gets close to 1.


⁴ All vectors in this paper are row vectors.


Recall that P (the row-normalised adjacency matrix) is a Markov 
chain, but in general it is neither aperiodic nor irreducible. Usually, 
though, in all practical cases P will be aperiodic, but reducible. In 
this paper, we shall assume that P is indeed aperiodic. 

Introducing the damping factor has the consequence of obtaining
an aperiodic irreducible chain. Indeed, for all α ∈ [0, 1], A(α)
is a Markov chain; moreover, if α < 1, A(α) is irreducible and
aperiodic. Hence, A(α) admits a unique limit distribution r(α).


3.1 Choosing the damping factor 


Clearly, r(α) is a rational (vector) function of α: usually, though,
one looks at r(α) only for a specific value of α. All algorithms to
compute PageRank actually compute (or, more precisely, provide
an estimate of) r(α) for some α supplied as input, and it is by now
established practice to choose α = 0.85. This choice was indeed
proposed by Brin and Page [17], and it is rumored that Google
itself uses this value; it seems that the rankings obtained with this
choice are very natural and satisfactory for the users.

Many authors have tried to devise a more thorough a posteriori
justification for 0.85. It is easy to get convinced that choosing
a small value for α is not appropriate, because too much weight
would be given to the “uniform” part of A(α): indeed, as we
remarked in the introduction, A(0) is the uniform matrix and r(0) is
the uniform distribution.

Conversely, as $\alpha \to 1^-$, the matrix A(α) tends to P: this fact
seems to suggest that choosing α close to 1 should give a “truer”
or “better” PageRank: this is a widely held opinion (as we
shall see, most probably a misconception). In any case, as we
remarked in the introduction there are some computational obstacles
to choosing a value of α too close to 1. The Power Method
converges more and more slowly [9] as $\alpha \to 1^-$, a fact that also
influences the other methods used to compute PageRank (which are,
after all, variants of the Power Method [17, 7, 6, 15, 11, 10]).
Indeed, the number of iterations required could in general be bounded
using the separation between the first and the second eigenvalue,
but unfortunately the separation can be abysmally small if α = 1,
making this technique not applicable. Moreover, if α is large the
computation of PageRank may become numerically ill-conditioned
(essentially for the same reason [8]).


3.2 Getting close to 1 


Even disregarding the problems discussed above, we shall provide
convincing reasons that make it inadvisable to use a value of
α close to 1.

First observe that, since r(α) is a rational (coordinatewise) bounded
function defined on [0, 1), the limit

$$r^* = \lim_{\alpha \to 1^-} r(\alpha)$$

exists (the reader can see the vector r* for our example in the
caption of Figure 7).

It is easy to see that r* is actually one of the limit distributions
of P (because $\lim_{\alpha \to 1^-} A(\alpha) = P$). There are some
natural questions about r* that we want to address:


• can we somehow characterise the properties of r*?


• what makes r* different from the other (infinitely many, if P
is reducible) limit distributions of P?


The first question is the most interesting, because it is about what
happens to PageRank when $\alpha \to 1^-$; in a sense, fortunately, it is
also the easiest to answer.

Before doing this, recall some basic definitions and facts about 
Markov chains. 
• Given two states x and y, we say that x leads to y iff there is
some m > 0 such that there is a non-zero probability to go
from x to y in m steps.


• A state x is transient iff there is a state y such that x leads
to y but y does not lead to x. A state is recurrent iff it is not
transient.


• In every limit distribution p of an aperiodic Markov chain, if
$p_x > 0$ then x is recurrent [20].


Let us now introduce some graph-theoretical notation. Let G be 
a graph. 


• Given a node x of G, we write $[x]_G$ for the (strongly
connected) component of G containing x.


• The component graph of G is a graph whose nodes are the
components of G, with an arc from $[x]_G$ to $[y]_G$ iff there are
nodes $x' \in [x]_G$ and $y' \in [y]_G$ such that there is an arc from
x' to y' in G. The component graph is acyclic, apart from the
possible presence of loops.


• If x, y are two nodes of G, we write $x \leadsto_G y$ iff there is a
nonempty directed path from x to y in G (by nonempty we
mean that the path should contain at least one arc).


Clearly, a node x is recurrent in P iff $[x]_{\bar G}$ is terminal;
otherwise said, x is recurrent (in the Markov chain P) iff
$x \leadsto_{\bar G} y$ implies $y \leadsto_{\bar G} x$ as well. Note that
nodes with just a loop are recurrent (and their component is looped,
too).
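The reachability criterion above translates directly into code. The sketch below is a naive illustration in Python, with the patched graph given as successor lists; the function names and the two toy graphs are assumptions for the example, and the quadratic-time search is chosen for clarity rather than efficiency.

```python
def reachable(successors, start):
    """Nodes reachable from `start` through a nonempty directed path."""
    seen, stack = set(), list(successors[start])
    while stack:
        y = stack.pop()
        if y not in seen:
            seen.add(y)
            stack.extend(successors[y])
    return seen

def recurrent_nodes(successors):
    """Nodes x such that every y reachable from x can reach x back."""
    n = len(successors)
    reach = [reachable(successors, x) for x in range(n)]
    return [x for x in range(n) if all(x in reach[y] for y in reach[x])]

# First toy graph: 0 <-> 1, 1 -> 2, and node 2 has only a loop.
with_looped_terminal = [[1], [0, 2], [2]]

# Second toy graph: 0 -> 1 with node 1 loopless terminal, already patched
# so that node 1 points to every node (itself included).
patched_dangling = [[1], [0, 1]]

first = recurrent_nodes(with_looped_terminal)
second = recurrent_nodes(patched_dangling)
```

In the first graph only the looped terminal node is recurrent; in the second, where the only terminal node is loopless, patching makes every node recurrent.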

We now turn to our characterisation theorem, which identifies
recurrent states on the basis of G, rather than $\bar G$. The essence
of the theorem is that, for what concerns recurrent states, the
difference between G and $\bar G$ is not significant, unless there are
no looped terminal nodes among the components of G. The latter case,
however, is as pathological as periodicity in a large web graph.


THEOREM 1. Let G and P be defined as above. Then:


1. if G has a component that is looped and terminal (in the
component graph), then a node is recurrent for P iff its
component is looped and terminal; hence, given any limit
distribution p for P, $p_x > 0$ implies that x is a node of G whose
component is looped and terminal;


2. if G does not contain a component that is looped and
terminal, then every node is recurrent.


PROOF. Note that $x \leadsto_{\bar G} y$ means that there is a nonempty
path from x to y in $\bar G$. Such a path can be decomposed into a
sequence of (possibly empty) paths in G: from $x = x_0$ to a loopless
terminal node $y_0$, from a node $x_1$ to a loopless terminal node
$y_1$, ..., from a node $x_k$ to $y_k = y$. Moreover, either k > 0, or
the only path (a path from x to y in G) contains at least one arc.

For case (1), let x be contained in a looped terminal component,
and suppose that $x \leadsto_{\bar G} y$. By the observation above, this
path in $\bar G$ can be decomposed into a sequence of paths of G towards
loopless terminal nodes, plus a final path to y: but from x you cannot
reach a loopless terminal node of G (because x is contained in a looped
terminal component), so the path is simply a nonempty path of G,
i.e., $x \leadsto_G y$. But then y is in the same component as x, so
$y \leadsto_G x$ as well, and we obtain the result. For the converse,
suppose that x is not in a looped terminal component: we will show that
there is a y such that $x \leadsto_{\bar G} y$ but not
$y \leadsto_{\bar G} x$. We distinguish two cases:


• suppose that there is a looped terminal component that can
be reached from $[x]_G$ in the component graph of G; let y be
any node in such a component. Clearly $x \leadsto_G y$, and hence
$x \leadsto_{\bar G} y$, but $y \leadsto_{\bar G} x$ does not hold (from y
you can only reach nodes of $[y]_G$, both in G and in $\bar G$);


• otherwise, suppose that there is a loopless terminal y such
that $x \leadsto_G y$ (or x = y if x itself is terminal); let z be any
node in a looped terminal component of G: now $x \leadsto_{\bar G} z$
(you first go from x to y and then you “jump” to z), but from z you
cannot reach x (because x is not in the same component).


For case (2), take any two nodes x and y of G. In the component
graph of G there will be two terminal components $[x']_G$ and $[y']_G$
that are reachable from $[x]_G$ and $[y]_G$, respectively. Both are, by
hypothesis, loopless. In other words, there are two terminal nodes
x' and y' such that $x \leadsto_G x'$ (or x = x') and $y \leadsto_G y'$
(or y = y'). This means that $x \leadsto_{\bar G} y$ and vice versa,
unless x = y (and both are terminal), in which case again both
$x \leadsto_{\bar G} y$ and vice versa.


The statement of the previous theorem may seem a bit unfath- 
omable. The essence, however, could be stated as follows: except 
for strongly connected graphs, or graphs whose terminal compo- 
nents are all trivial and loopless, the recurrent nodes are exactly 
those whose component is looped and terminal. These nodes are of- 
ten called rank sinks, as they absorb all the rank circulating through 
the graph. 

As we remarked, a real-world graph will certainly contain at least
one looped terminal component, so the first statement of the
theorem will hold. This means that most nodes x will be such that
$r^*_x = 0$. In particular, this will be true of all the nodes in the
core component [13]: this result is somehow surprising, because
it means that many important Web pages (that are contained in the
core component) will have rank 0 in the limit (see, for instance,
node 0 in our example).

This is a rather convincing justification that, contrary to
common belief, choosing α too close to 1 does not provide any
good PageRank. Rather, PageRank becomes “sensible” somewhere
in between 0 and 1.
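The concentration effect can be observed numerically on a tiny graph. The Python sketch below is an illustration under stated assumptions: the three-node matrix (nodes 0 and 1 link to each other, node 1 also links to node 2, node 2 has only a loop), the uniform preference vector, the function name `pagerank` and the fixed iteration count are all choices made for this example.

```python
def pagerank(P, v, alpha, iterations=500):
    """Plain Power Method: x <- alpha x P + (1 - alpha) v."""
    n = len(v)
    x = list(v)
    for _ in range(iterations):
        xP = [sum(x[i] * P[i][j] for i in range(n)) for j in range(n)]
        x = [alpha * a + (1 - alpha) * b for a, b in zip(xP, v)]
    return x

# Nodes 0 and 1 play the role of a (toy) core; node 2 is a looped terminal node.
P = [[0.0, 1.0, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 0.0, 1.0]]
v = [1 / 3] * 3

alphas = [0.5, 0.85, 0.99, 0.999]
rank_of_node_0 = [pagerank(P, v, a)[0] for a in alphas]
rank_of_node_2 = [pagerank(P, v, a)[2] for a in alphas]
```

As α grows, the rank of node 0 shrinks towards 0 while the rank of the looped terminal node approaches 1, in agreement with the first statement of Theorem 1.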


As far as the second question is concerned, we offer the following
conjecture.


CONJECTURE 1. r* is the limit distribution of P when the starting
distribution is uniform, that is,

$$\lim_{\alpha \to 1^-} r(\alpha) = \lim_{n \to \infty} \frac{\mathbf{1}}{N} P^n.$$

Note that the conjecture is trivial when P is irreducible, because in
that case P has but one stationary distribution.


4. DERIVATIVES 


The reader should by now be convinced that the behaviour of
PageRank with respect to the damping factor is nonobvious: r(α)
should be considered a function of α, and studied as such.

The standard tool for understanding changes in a real function 
is the analysis of its derivatives. Correspondingly, we are going to 
provide mathematical support for this analysis. 


4.1 Exact formulae 


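The main objective of this section is providing exact formulae
for the derivatives of r(−). Define $r'(\alpha)$, $r''(\alpha)$, ...,
$r^{(k)}(\alpha)$ as the first, second, ..., k-th derivative of r(α)
with respect to α.

We start by providing the basic relations between these vector
functions:


THEOREM 2. The following identities hold:
1. $r'(\alpha) = (r(\alpha)P - v)\,C(\alpha)^{-1}$;
2. for all k > 0, $r^{(k+1)}(\alpha) = (k+1)\, r^{(k)}(\alpha)\, P\, C(\alpha)^{-1}$.


PROOF. Multiplying (1) by C(α) gives $r(\alpha)C(\alpha) = (1-\alpha)v$;
differentiating memberwise, and using $C'(\alpha) = -P$:


$$
\begin{aligned}
r(\alpha)C'(\alpha) + r'(\alpha)C(\alpha) &= -v && (2) \\
r'(\alpha)C(\alpha) &= -r(\alpha)C'(\alpha) - v && (3) \\
r'(\alpha)C(\alpha) &= r(\alpha)P - v. && (4)
\end{aligned}
$$


Since C(α) is invertible:

$$r'(\alpha) = (r(\alpha)P - v)\,C(\alpha)^{-1}.$$

Moreover, differentiating (4) once more, we obtain:

$$
\begin{aligned}
r'(\alpha)C'(\alpha) + r''(\alpha)C(\alpha) &= r'(\alpha)P \\
r''(\alpha)C(\alpha) &= r'(\alpha)P - r'(\alpha)C'(\alpha) \\
r''(\alpha)C(\alpha) &= r'(\alpha)P + r'(\alpha)P,
\end{aligned}
$$

hence

$$r''(\alpha) = 2\, r'(\alpha)\, P\, C(\alpha)^{-1},$$


which accounts for the base case (k = 1) of an induction for the
second statement. For the inductive step, again multiplying by
C(α) and differentiating memberwise:


$$
\begin{aligned}
r^{(k+2)}(\alpha)C(\alpha) + r^{(k+1)}(\alpha)C'(\alpha) &= (k+1)\, r^{(k+1)}(\alpha)\, P \\
r^{(k+2)}(\alpha)C(\alpha) &= r^{(k+1)}(\alpha)\bigl[(k+1)P - C'(\alpha)\bigr]
\end{aligned}
$$


and the thesis follows easily.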


We can reformulate the statement concerning the first-order
derivative as follows:


COROLLARY 1. The following identity holds:

$$r'(\alpha) = r(\alpha)\left(Q(\alpha) - \frac{1}{1-\alpha}\, I\right).$$


PROOF. From Theorem 2, we obtain
$r'(\alpha) = r(\alpha)PC(\alpha)^{-1} - vC(\alpha)^{-1}$. Using (1) we can
rewrite this as $r(\alpha)PC(\alpha)^{-1} - \frac{1}{1-\alpha}\, r(\alpha)$,
hence the result.


Moreover, we can explicitly write a closed formula for the generic
derivative:


COROLLARY 2. For every k > 0

$$r^{(k)}(\alpha) = k!\,(r(\alpha)P - v)\,C(\alpha)^{-1}\,Q(\alpha)^{k-1}$$

or, equivalently,

$$r^{(k)}(\alpha) = k!\; r(\alpha)\left(Q(\alpha) - \frac{1}{1-\alpha}\, I\right) Q(\alpha)^{k-1}.$$


PROOF. Just proceed from Theorem 2 by iterated substitution,
and finally apply Corollary 1.
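Corollary 1 can be checked numerically against a finite-difference estimate. The Python sketch below is only a sanity check under stated assumptions: the toy matrix P, the uniform vector v, the function names, the truncation of the series for Q(α) and the step h are all choices made for this illustration. It uses the expansion of r(α)Q(α) as the sum over s of r(α)P(αP)^s.

```python
def vec_mat(x, M):
    """Row vector times square matrix."""
    n = len(x)
    return [sum(x[i] * M[i][j] for i in range(n)) for j in range(n)]

def pagerank(P, v, alpha, iterations=400):
    x = list(v)
    for _ in range(iterations):
        x = [alpha * a + (1 - alpha) * b for a, b in zip(vec_mat(x, P), v)]
    return x

def derivative_by_corollary(P, v, alpha, terms=400):
    """r'(alpha) = r Q - r / (1 - alpha), with r Q = sum_s r P (alpha P)^s truncated."""
    r = pagerank(P, v, alpha)
    term = vec_mat(r, P)               # r P (alpha P)^0
    rQ = [0.0] * len(v)
    for _ in range(terms):
        rQ = [a + b for a, b in zip(rQ, term)]
        term = [alpha * a for a in vec_mat(term, P)]
    return [a - b / (1 - alpha) for a, b in zip(rQ, r)]

def derivative_by_differences(P, v, alpha, h=1e-5):
    """Central finite difference of the PageRank vector."""
    hi, lo = pagerank(P, v, alpha + h), pagerank(P, v, alpha - h)
    return [(a - b) / (2 * h) for a, b in zip(hi, lo)]

P = [[0.0, 0.5, 0.5],
     [0.0, 0.0, 1.0],
     [1 / 3, 1 / 3, 1 / 3]]
v = [1 / 3] * 3
exact = derivative_by_corollary(P, v, 0.85)
estimate = derivative_by_differences(P, v, 0.85)
```

Since r(α) always sums to 1, the entries of its derivative sum to 0, which gives a second quick check on the computed vector.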


4.2 Approximating the derivatives 


The formulae obtained in Section 4.1 do not lead directly to an
effective algorithm that computes derivatives: even assuming that
the exact value of r(α) is available, to obtain the derivatives one
should invert C(α) (see Theorem 2), a heavy (in fact, infeasible)
computational task. However, in this section we shall provide a
way to obtain simultaneous approximations for PageRank and its
derivatives for a given value of α, and we will show how these
approximations converge to the desired vectors.

The simplest and most important algorithm that computes
PageRank [17] is an application of the Power Method; the algorithm
computes a sequence of vectors $v_0, v_1, \ldots$ where $v_0 = v$ and
$v_{k+1} = v_k A(\alpha)$. This sequence of vectors converges to r(α), and
convergence speed depends on α; more precisely, the difference in
norm between the k-th iterate and the exact value is $O(\alpha^k)$. In
practice, the algorithm provides a good approximation quickly: in
the original paper [17] the authors state that 40 to 50 iterations are
enough on reasonable data sets; of course, more sophisticated
approaches have been proposed in the literature to reduce the number
of iterations and/or the amount of computation needed at each
iteration [17, 7, 6, 15, 11, 10], but they are basically all variants
of the Power Method.
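To get a feeling for how the $O(\alpha^k)$ bound scales with the damping factor, one can compute the smallest k for which α^k falls below a given tolerance. The Python sketch below is a rough back-of-the-envelope estimate only: it ignores the constant hidden in the big-oh notation, and the function name and the tolerance are assumptions made for this example.

```python
import math

def iterations_needed(alpha, tolerance):
    """Smallest k with alpha**k <= tolerance (constant factors ignored)."""
    return math.ceil(math.log(tolerance) / math.log(alpha))

# Estimated iteration counts for a few damping factors (illustrative values).
estimates = {alpha: iterations_needed(alpha, 1e-6) for alpha in (0.5, 0.85, 0.99)}
```

The estimate grows quickly as α approaches 1, which is the computational obstacle discussed in Section 3.1.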

The reader can see the first few iterates of the Power Method 
applied to our example in Figure 1. 


We are going to present a modified version of the basic algorithm
that will compute PageRank and its derivatives up to (any desired)
index K, and to do this it will use K + 1 vectors. In other words, it
will build K + 1 vector sequences: the sequence $s_0^{(0)}(\alpha)$,
$s_1^{(0)}(\alpha)$, ..., $s_t^{(0)}(\alpha)$, ... will be used to
approximate r(α) (and will be defined exactly as in the classical
PageRank algorithm); the sequence $s_0^{(1)}(\alpha)$,
$s_1^{(1)}(\alpha)$, ..., $s_t^{(1)}(\alpha)$, ... will be used to
approximate $r'(\alpha)$; and so on. Note that the sequence
$s_0^{(k)}(\alpha)$, $s_1^{(k)}(\alpha)$, ..., $s_t^{(k)}(\alpha)$, ...
will not, in general, converge to $r^{(k)}(\alpha)$ per se; rather,
there will be an associated sequence $q_0^{(k)}(\alpha)$,
$q_1^{(k)}(\alpha)$, ..., $q_t^{(k)}(\alpha)$, ... based on it, that will
actually converge to the desired derivative.


$$
\begin{aligned}
s_0^{(0)}(\alpha) &= v \\
s_{t+1}^{(0)}(\alpha) &= s_t^{(0)}(\alpha)\, A(\alpha) \\
s_0^{(k+1)}(\alpha) &= q_0^{(k)}(\alpha) \\
s_{t+1}^{(k+1)}(\alpha) &= \alpha\, s_t^{(k+1)}(\alpha)\, P + q_t^{(k)}(\alpha)\, P \\
q_t^{(0)}(\alpha) &= s_t^{(0)}(\alpha) \\
q_t^{(1)}(\alpha) &= s_t^{(1)}(\alpha) - \frac{1}{1-\alpha}\, s_t^{(0)}(\alpha) \\
q_t^{(k)}(\alpha) &= k\, s_t^{(k)}(\alpha) \quad \text{for all } k \ge 2.
\end{aligned}
$$


Figure 2: Basic definitions for the approximation algorithm.


The vector sequences are defined in Figure 2. Note that only the
K + 1 vectors $s_t^{(k)}(\alpha)$ (0 ≤ k ≤ K) need to be stored, whereas
$q_t^{(k)}(\alpha)$ are only defined for convenience, and can be
implemented, for example, as a function.


Our first result is about convergence of the first-order derivative.⁵


THEOREM 3. $\lim_{t \to \infty} q_t^{(1)}(\alpha) = r'(\alpha)$, and the
difference in norm is $O(t\alpha^t)$, that is:

$$\bigl\| q_t^{(1)}(\alpha) - r'(\alpha) \bigr\| = O(t\alpha^t) \quad \text{as } t \to \infty.$$


⁵ We do not assume a particular norm: all our proofs are correct
with any p-norm. We just note that $\|S\|_1 = 1$ for any stochastic
S (as we use row vectors), and so $\|P^t\|_p$ is bounded by a constant
that depends on P's size, but not on P's elements or on t.
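PROOF. Recall that $\| q_t^{(0)}(\alpha) - r(\alpha) \| = O(\alpha^t)$,
since the second eigenvalue of A(α) is at most α [9]. An easy proof by
induction shows that for all t

$$s_t^{(1)}(\alpha) = v(\alpha P)^t + \sum_{s=0}^{t-1} q_{t-1-s}^{(0)}(\alpha)\,(\alpha P)^s P,$$

so

$$q_t^{(1)}(\alpha) = v(\alpha P)^t + \sum_{s=0}^{t-1} q_{t-1-s}^{(0)}(\alpha)\,(\alpha P)^s P - \frac{1}{1-\alpha}\, q_t^{(0)}(\alpha).$$

Now, by Corollary 1, we have:

$$r'(\alpha) = r(\alpha)\left(Q(\alpha) - \frac{1}{1-\alpha}\, I\right).$$

Since

$$Q(\alpha) = PC(\alpha)^{-1} = P(I - \alpha P)^{-1} = \sum_{s=0}^{\infty} (\alpha P)^s P,$$

we have

$$r'(\alpha) = \sum_{s=0}^{\infty} r(\alpha)\,(\alpha P)^s P - \frac{1}{1-\alpha}\, r(\alpha).$$

Hence, we can bound the convergence rate as follows:

$$
\begin{aligned}
\bigl\| r'(\alpha) - q_t^{(1)}(\alpha) \bigr\| \le{}& \Bigl\| \sum_{s=t}^{\infty} r(\alpha)\,(\alpha P)^s P \Bigr\| \\
&+ \Bigl\| \sum_{s=0}^{t-1} \bigl( r(\alpha) - q_{t-1-s}^{(0)}(\alpha) \bigr)(\alpha P)^s P \Bigr\| \\
&+ \frac{1}{1-\alpha}\, \bigl\| r(\alpha) - q_t^{(0)}(\alpha) \bigr\| + \bigl\| v(\alpha P)^t \bigr\|.
\end{aligned}
$$

We provide an upper bound for each of the summands above. As
far as the first summand is concerned, recall that
$\sum_{s=t}^{\infty} X^s = (I - X)^{-1} X^t$ (whenever the first series
converges). Thus,

$$\Bigl\| \sum_{s=t}^{\infty} r(\alpha)\,(\alpha P)^s P \Bigr\| \le \bigl\| (I - \alpha P)^{-1} \bigr\| \cdot \bigl\| (\alpha P)^t \bigr\| \cdot \bigl\| r(\alpha) P \bigr\|,$$

and the first summand is $O(\alpha^t)$. The second summand can be
bounded as follows:

$$\Bigl\| \sum_{s=0}^{t-1} \bigl( r(\alpha) - q_{t-1-s}^{(0)}(\alpha) \bigr)(\alpha P)^s P \Bigr\| = O(\alpha^t)\, \Bigl\| \sum_{s=0}^{t-1} P^{s+1} \Bigr\| = O(\alpha^t) \sum_{s=0}^{t-1} O(1) = O(t\alpha^t).$$

The third and the fourth summands are both $O(\alpha^t)$. All summands
are thus $O(t\alpha^t)$, hence the result.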


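As far as the other derivatives are concerned, we have:


THEOREM 4. For every k > 1, $\lim_{t \to \infty} q_t^{(k)}(\alpha) = r^{(k)}(\alpha)$,
and the difference in norm is $O(t^k \alpha^t)$.


PROOF. First of all, by induction on k and t, one can prove that

$$s_t^{(k)}(\alpha) = (k-1)!\, \frac{\alpha}{\alpha - 1}\, v(\alpha P)^t + \sum_{s=0}^{t-1} q_{t-1-s}^{(k-1)}(\alpha)\,(\alpha P)^s P,$$

for all k > 1 and all t. The base case t = 0 is trivial (by an easy
induction on all k > 1), whereas the case k = 2 can be obtained by
induction on t using Theorem 3, noting that the rule for computing
$q_t^{(1)}(\alpha)$ is a special case. The inductive step is then obtained
using the rule that defines $s_{t+1}^{(k+1)}(\alpha)$.

Now, recalling that (from Theorem 2)

$$r^{(k)}(\alpha) = k\, r^{(k-1)}(\alpha)\, P\, C(\alpha)^{-1} = k \sum_{s=0}^{\infty} r^{(k-1)}(\alpha)\,(\alpha P)^s P,$$

we have

$$
\begin{aligned}
\bigl\| r^{(k)}(\alpha) - q_t^{(k)}(\alpha) \bigr\| ={}& \Bigl\| k \sum_{s=0}^{\infty} r^{(k-1)}(\alpha)\,(\alpha P)^s P - k\, s_t^{(k)}(\alpha) \Bigr\| \\
\le{}& k\, \Bigl\| \sum_{s=t}^{\infty} r^{(k-1)}(\alpha)\,(\alpha P)^s P \Bigr\| + k!\, \Bigl\| \frac{\alpha}{\alpha - 1}\, v(\alpha P)^t \Bigr\| \\
&+ k\, \Bigl\| \sum_{s=0}^{t-1} \bigl( r^{(k-1)}(\alpha) - q_{t-1-s}^{(k-1)}(\alpha) \bigr)(\alpha P)^s P \Bigr\|.
\end{aligned}
$$

The result follows along the lines of the last part of the proof of
Theorem 3.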


We remark two important points that deserve further analysis.
First of all, the big-oh notation hides a number of constants
independent of t. However, when k is large or α is very close to 1
these constants may become important. Second, we did not give a
detailed evaluation of the numerical precision that is necessary to
perform these computations.


4.3 Implementation of the algorithm 


The results of the previous section can be used to modify the
classical PageRank algorithm, based on the Power Method, so as to
compute an approximation of the derivatives of PageRank up to a
certain index.

The algorithm uses a vector s[−, −] where the first index
represents the derivative index (from 0 to K, inclusive, where K is
the highest derivative order to be computed) and the second index
represents the node. In other words, at step t the vector s[k, −]
represents $s_t^{(k)}(\alpha)$. The vector $q_t^{(k)}(\alpha)$ is not itself
represented as a vector, but rather it is implemented by the
procedure q().

The procedure init() initialises the vector s[−, −], whereas step()
computes the vector for the next iteration (the new vector is
indicated by s'[−, −]).

The stopping criterion can be decided in many ways: for exam- 
ple, at each step, the norms of the differences between each deriva- 
tive and the derivative at the previous step are computed, and the 
iteration is stopped if all such norms are below a certain threshold. 


procedure q(k, i)
  if k = 0 then return s[0, i];
  else if k = 1 then return s[1, i] − s[0, i]/(1 − α);
  else return k · s[k, i];


procedure init()
  for i := 0, 1, ..., N − 1 do
    s[0, i] := v[i];
    for k := 1, ..., K do
      s[k, i] := q(k − 1, i);
    end for
  end for


procedure step()
  for i := 0, 1, ..., N − 1 do
    s'[0, i] := (1 − α) · v[i];    // (1 − α)v term of A(α), see Figure 2
    for k := 1, ..., K do
      s'[k, i] := 0;
    end for
  end for
  for i := 0, 1, ..., N − 1 do
    d := outdegree of node i;
    for all successors j of i do
      s'[0, j] := s'[0, j] + α · q(0, i)/d;
      for k := 1, 2, ..., K do
        s'[k, j] := s'[k, j] + (α · s[k, i] + q(k − 1, i))/d;
      end for
    end for
  end for


procedure computePageRankAndDerivatives()
  init();
  do step(); while not stopping condition;

5. MACLAURIN SERIES 


The rational function r(−) can be expressed using its Maclaurin
series (i.e., the Taylor series about 0); let us denote by
$t_n(\alpha)$ the n-th degree Maclaurin polynomial of r(−) evaluated
at α.

Clearly, Maclaurin polynomials offer an appealing way to study
PageRank in relation to α. To obtain an explicit formula for
$t_n(\alpha)$, just recall from Corollary 2 that
$r^{(k)}(0) = k!\; r(0)\,(Q(0) - I)\,Q(0)^{k-1}$.
Since r(0) = v and Q(0) = P, we have, for all k > 0,

$$r^{(k)}(0) = k!\; v\,(P^k - P^{k-1}).$$

Now, since $t_n(\alpha) = \sum_{k=0}^{n} \frac{1}{k!}\, \alpha^k r^{(k)}(0)$, we have

$$t_n(\alpha) = v\left( I + \sum_{k=1}^{n} \alpha^k \,(P^k - P^{k-1}) \right).$$

Two important problems face us now: first of all, how to compute
$t_n(\alpha)$; second, how to choose n. Both problems will be solved by
a surprisingly simple relationship between Maclaurin polynomials
and the Power Method that will be proved in this section. To obtain
our main result, we will need the following:


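LEMMA 1. Let $\mathcal{M}$ be a set of square matrices of the same size,
and $R \in \mathcal{M}$ such that for every $M \in \mathcal{M}$ we have
MR = R. Then for all $M \in \mathcal{M}$, $\lambda \in \mathbb{R}$ and for
all n we have

$$(\lambda M + (1-\lambda)R)^n = \lambda^n M^n + (1-\lambda) \sum_{k=0}^{n-1} \lambda^k R M^k,$$

or, equivalently,

$$(\lambda M + (1-\lambda)R)^n = \lambda^n M^n + R\,(I - \lambda^n M^{n-1}) + R \sum_{k=1}^{n-1} \lambda^k (M^k - M^{k-1}).$$


PROOF. By an easy induction. The first statement is trivial for
n = 0. If we multiply both members by $\lambda M + (1-\lambda)R$ on the
right we have

$$
\begin{aligned}
(\lambda M + (1-\lambda)R)^{n+1} ={}& \lambda^{n+1} M^{n+1} + (1-\lambda) \sum_{k=0}^{n-1} \lambda^{k+1} R M^{k+1} + \lambda^n (1-\lambda) R \\
&+ (1-\lambda)^2 \sum_{k=0}^{n-1} \lambda^k R \\
={}& \lambda^{n+1} M^{n+1} + (1-\lambda) \sum_{k=0}^{n-1} \lambda^{k+1} R M^{k+1} + \lambda^n (1-\lambda) R \\
&+ (1-\lambda)^2\, \frac{1-\lambda^n}{1-\lambda}\, R \\
={}& \lambda^{n+1} M^{n+1} + (1-\lambda) \sum_{k=0}^{n} \lambda^k R M^k.
\end{aligned}
$$

The second statement can then be proved by expanding the summation
and collecting monomials according to the powers of λ.


Of course, the last result can be easily restated in any $\mathbb{R}$-algebra.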
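The first identity of Lemma 1 can be checked exactly on a small instance with rational arithmetic. The Python sketch below is only an illustration: the matrices, the value of λ and the helper names are assumptions made for the example. Here M is a row-stochastic matrix and R has every row equal to a fixed distribution v, so that MR = R and RR = R hold.

```python
from fractions import Fraction

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def mat_pow(A, n):
    size = len(A)
    result = [[Fraction(int(i == j)) for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, A)
    return result

def combine(a, A, b, B):
    """Entrywise a*A + b*B."""
    return [[a * x + b * y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def lemma_left(M, R, lam, n):
    return mat_pow(combine(lam, M, 1 - lam, R), n)

def lemma_right(M, R, lam, n):
    total = [[lam ** n * x for x in row] for row in mat_pow(M, n)]
    for k in range(n):
        term = mat_mul(R, mat_pow(M, k))
        total = combine(1, total, (1 - lam) * lam ** k, term)
    return total

third = Fraction(1, 3)
M = [[0, Fraction(1, 2), Fraction(1, 2)], [0, 0, 1], [third, third, third]]
v = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
R = [list(v) for _ in range(3)]     # the matrix 1^T v: every row equals v
lam = Fraction(17, 20)
```

Because the arithmetic is exact, the two sides can be compared with plain equality for several values of n.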


We can now come to the main result of this section, which equates 
analytic approximation (the degree of the Maclaurin polynomial) 
with computational approximation (the number of iterations of the 
Power Method): 


THEOREM 5. The n-th approximation of PageRank computed 
by the Power Method with damping factor a coincides with the n- 
th degree Maclaurin polynomial of PageRank evaluated in a. In 
other words, vA(a)" = tn (œ). 


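PROOF. Apply Lemma 1 to the case when $M = P$, $R = \mathbf{1}^T v$
and $\lambda = \alpha$. We have:

\[ A(\alpha)^n = \alpha^n P^n + \mathbf{1}^T v - \alpha^n \mathbf{1}^T v P^{n-1} + \mathbf{1}^T v \sum_{k=1}^{n-1} \alpha^k \left( P^k - P^{k-1} \right), \]

hence, since $v \mathbf{1}^T = 1$,

\[
\begin{aligned}
vA(\alpha)^n
&= \alpha^n vP^n + v - \alpha^n vP^{n-1} + v \sum_{k=1}^{n-1} \alpha^k \left( P^k - P^{k-1} \right) \\
&= v + v \sum_{k=1}^{n} \alpha^k \left( P^k - P^{k-1} \right) = t_n(\alpha).
\end{aligned}
\]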
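Theorem 5 lends itself to a direct numerical check. The sketch below is an illustration under stated assumptions, not the authors' code: it uses a hypothetical 3-node row-stochastic matrix `P`, a uniform preference vector `v`, and two helper functions (`power_method`, `maclaurin`) named here for convenience. It starts the Power Method from $v$, as in the statement $vA(\alpha)^n = t_n(\alpha)$.

```python
def vecmat(x, P):
    """Row vector times matrix: (xP)_j = sum_i x_i P_ij."""
    n = len(x)
    return [sum(x[i] * P[i][j] for i in range(n)) for j in range(n)]

def power_method(P, v, alpha, n):
    """n-th iterate v A(alpha)^n, with A(alpha) = alpha P + (1 - alpha) 1^T v."""
    x = v[:]
    for _ in range(n):
        xP = vecmat(x, P)
        total = sum(x)  # x 1^T, equal to 1 up to rounding
        x = [alpha * xP[j] + (1 - alpha) * total * v[j] for j in range(len(v))]
    return x

def maclaurin(P, v, alpha, n):
    """t_n(alpha) = v + sum_{k=1..n} alpha^k (v P^k - v P^(k-1))."""
    t = v[:]
    prev = v[:]  # holds v P^(k-1)
    for k in range(1, n + 1):
        cur = vecmat(prev, P)  # v P^k
        t = [t[j] + alpha ** k * (cur[j] - prev[j]) for j in range(len(v))]
        prev = cur
    return t

# Hypothetical 3-node row-stochastic matrix, for illustration only.
P = [[0.0, 0.5, 0.5],
     [1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
v = [1 / 3, 1 / 3, 1 / 3]

if __name__ == "__main__":
    for n in range(6):
        a = power_method(P, v, 0.85, n)
        b = maclaurin(P, v, 0.85, n)
        print(n, max(abs(p - q) for p, q in zip(a, b)))
```

For every `n` the two vectors should agree up to floating-point rounding.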
k=1 


As a consequence:


COROLLARY 3. The difference between the k-th and the (k - 1)-th
approximation of PageRank (as computed by the Power Method),
divided by $\alpha^k$, is the k-th coefficient⁶ of the Maclaurin series of
PageRank.

⁶The coefficients are vectors, because we are approximating a vector
function.


The previous corollary is apparently innocuous. However, as a
consequence the data obtained computing PageRank for a given $\alpha$
can be used to compute immediately PageRank for any other $\alpha$,
obtaining the result of the Power Method after the same number of
iterations. Indeed, by saving the Maclaurin coefficients during the
computation of PageRank with a specific $\alpha$ it is possible to study
the behaviour of PageRank when $\alpha$ varies. Even more is true, of
course: using standard series derivation techniques, one can
approximate the k-th derivative (lowering by k, of course, the degree
of the approximating polynomial). Note, however, that the algorithm presented in
Section 4.3 provides values of the derivatives for a specific $\alpha$ with
a precision guarantee.
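The reuse of a single Power Method run described above can be sketched as follows. This is a minimal illustration under assumptions, not the paper's implementation: the 3-node matrix `P`, the uniform `v`, and the helper names (`power_iterates`, `maclaurin_coefficients`, `evaluate`, `derivative`) are all hypothetical. Coefficients are recovered through Corollary 3 from a run with $\alpha = 0.85$ and then evaluated at a different damping factor.

```python
def vecmat(x, P):
    """Row vector times matrix: (xP)_j = sum_i x_i P_ij."""
    n = len(x)
    return [sum(x[i] * P[i][j] for i in range(n)) for j in range(n)]

def power_iterates(P, v, alpha, n):
    """Return the Power Method iterates x_0 = v, x_1, ..., x_n."""
    xs = [v[:]]
    for _ in range(n):
        x = xs[-1]
        xP = vecmat(x, P)
        total = sum(x)  # x 1^T, equal to 1 up to rounding
        xs.append([alpha * xP[j] + (1 - alpha) * total * v[j]
                   for j in range(len(v))])
    return xs

def maclaurin_coefficients(iterates, alpha):
    """Corollary 3: coefficient k is (x_k - x_(k-1)) / alpha^k; coefficient 0 is x_0."""
    coeffs = [iterates[0][:]]
    for k in range(1, len(iterates)):
        coeffs.append([(a - b) / alpha ** k
                       for a, b in zip(iterates[k], iterates[k - 1])])
    return coeffs

def evaluate(coeffs, beta):
    """Evaluate the Maclaurin polynomial sum_k beta^k c_k at beta."""
    size = len(coeffs[0])
    return [sum(beta ** k * c[j] for k, c in enumerate(coeffs))
            for j in range(size)]

def derivative(coeffs, beta):
    """First derivative of the polynomial: sum_{k>=1} k beta^(k-1) c_k."""
    size = len(coeffs[0])
    return [sum(k * beta ** (k - 1) * c[j]
                for k, c in enumerate(coeffs) if k > 0)
            for j in range(size)]

# Hypothetical 3-node row-stochastic matrix, for illustration only.
P = [[0.0, 0.5, 0.5],
     [1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
v = [1 / 3, 1 / 3, 1 / 3]

if __name__ == "__main__":
    coeffs = maclaurin_coefficients(power_iterates(P, v, 0.85, 20), 0.85)
    print(evaluate(coeffs, 0.5))                 # reused run, evaluated at 0.5
    print(power_iterates(P, v, 0.5, 20)[-1])     # direct run with alpha = 0.5
```

One caveat of this sketch: dividing by $\alpha^k$ amplifies rounding errors when $k$ is large or $\alpha$ is small, so in practice one might prefer to accumulate $vP^k - vP^{k-1}$ directly.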

The first few coefficients of the Maclaurin polynomial for our 
example are shown in Figure 2. 


6. EXPERIMENTAL RESULTS 


Figure 3 illustrates from an experimental viewpoint the convergence
speed theorems of Section 4.2. For this computation we used a small,
325 557-node graph of the sites of the Italian CNR. We computed
PageRank and its derivatives up to order four (inclusive) and plotted
the difference, in $L_1$-norm, between two successive iterates during
the first 70 iterations; for every derivative we
also show the upper bounds proved in Theorems 3 and 4. Note that
there is a transient irregular behaviour due to the constants hidden
in the proofs.

Figure 4 shows the convergence of Maclaurin polynomials toward
the actual PageRank behaviour for a chosen node. Finally,
in Figure 9 we display the approximation obtained with a Maclaurin
polynomial of degree 100. We chose four nodes with different
behaviours (monotonic increasing/decreasing, unimodal concave/convex)
to show that the approximation is excellent in all these
cases. For this experiment we used a 41 291 594-node snapshot of
the Italian web gathered by UbiCrawler [1] and indexed by WebGraph [2].


Figure 3: The convergence speed in the computation of derivatives
up to order 4 (the label is the order of the derivative).

Figure 4: Approximating $r(\alpha)$ for a specific node (cross-shaped
points) using Maclaurin polynomials of different degrees
(shown in the legend).


7. CONCLUSIONS


We have presented a number of results which outline the first analytic
study of PageRank when the damping factor changes. While
our results are mainly theoretical in nature, they provide efficient
ways to study the global behaviour of PageRank, and dispel a few
myths (in particular, about the significance of PageRank when $\alpha$
gets close to 1).

A last point worth noting is that our algorithm to
obtain the Maclaurin polynomials for PageRank may be used to
determine new forms of ranking; for example, one may define the
total rank of a page $x$ as $\int_0^1 r_x(\alpha)\, d\alpha$. This quantity (the area under
the PageRank curve of node $x$) is independent of $\alpha$, and induces
interesting rankings that will be studied in a forthcoming paper.
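As a hedged illustration of how the Maclaurin polynomials might enter such a computation (this is a sketch, not a result stated in the paper): if $r(\alpha)$ is replaced by $t_n(\alpha)$, and $c_k$ denotes the k-th coefficient vector (an assumed notation, with $c_0 = v$ and $c_k = v(P^k - P^{k-1})$ for $k > 0$), the integral can be carried out term by term.

```latex
\[
\int_0^1 t_n(\alpha)\, d\alpha
  = \sum_{k=0}^{n} c_k \int_0^1 \alpha^k\, d\alpha
  = \sum_{k=0}^{n} \frac{c_k}{k+1}
  = v + v \sum_{k=1}^{n} \frac{P^k - P^{k-1}}{k+1}.
\]
```

How well this approximates the total rank depends on how $t_n$ behaves as $\alpha$ approaches 1, which this sketch does not address.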
Appendix: An example


To clarify the discussion of the previous section, we provide a full
example in Figure 5. Node 3 is the only terminal node of the graph,
but nodes 4 and 5 belong to a looped terminal component (see Figure 6).
Correspondingly, Figure 8 shows that PageRank for nodes
4 and 5 grows, whereas for all other nodes it goes to 0 as $\alpha \to 1^-$.
Note, however, the maximum attained by node 0 at $\alpha \approx 0.7$.


Step  Approximation
0     (0.100, 0.100, 0.100, 0.100, 0.100, 0.100, 0.100, 0.100, 0.100, 0.100)
1     (0.415, 0.049, 0.075, 0.075, 0.113, 0.070, 0.049, 0.049, 0.049, 0.049)
2     (0.232, 0.100, 0.051, 0.062, 0.078, 0.072, 0.100, 0.100, 0.100, 0.100)
3     (0.391, 0.066, 0.070, 0.049, 0.097, 0.056, 0.066, 0.066, 0.066, 0.066)
4     (0.283, 0.093, 0.054, 0.056, 0.076, 0.063, 0.093, 0.093, 0.093, 0.093)


Table 1: The approximations computed in the first iterations of the Power Method (with $\alpha = 0.85$).
Figure 5: A graph with n = 10 nodes and its modified version.


Figure 6: The components of the graph in Figure 5 and the
corresponding component graph. The dashed line in the component
graph gathers components that are merged in the modified graph. In
both $G$ and its modified version the only terminal component is { 4, 5 }.


Figure 7: The explicit formula for PageRank as a function of $\alpha$. Its limit is $\lim_{\alpha \to 1^-} r(\alpha) = r^* = (0, 0, 0, 0, 1/2, 1/2, 0, 0, 0, 0)$.


Figure 8: The behaviour of the components of $r(\alpha)$. They all go
to zero except for nodes 4 and 5, the only nodes belonging to a
terminal component. Note, however, the maximum attained by
node 0 at $\alpha \approx 0.7$.
Term        Coefficient
$\alpha^0$  (0.100, 0.100, 0.100, 0.100, 0.100, 0.100, 0.100, 0.100, 0.100, 0.100)
$\alpha^1$  (0.371, -0.058, -0.028, -0.028, 0.015, -0.034, -0.058, -0.058, -0.058, -0.058)
$\alpha^2$  (-0.253, 0.070, -0.033, -0.018, -0.048, 0.003, 0.070, 0.070, 0.070, 0.070)
$\alpha^3$  (0.260, -0.055, 0.030, -0.021, 0.032, -0.026, -0.055, -0.055, -0.055, -0.055)
$\alpha^4$  (-0.207, 0.050, -0.029, 0.013, -0.040, 0.012, 0.050, 0.050, 0.050, 0.050)


Table 2: The coefficients of the Maclaurin series.
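The relation between the two tables stated by Corollary 3 can be spot-checked on the first component (node 0). The sketch below copies the tabulated three-decimal values from Table 1 and Table 2, assuming the rows of Table 1 correspond to steps 0 to 4 in order; the function name is a hypothetical helper. Because the tabulated values are rounded, only agreement up to rounding can be expected.

```python
ALPHA = 0.85

# First components of the rows of Table 1 (Power Method approximations),
# assumed to be listed in order of steps 0..4.
approximations = [0.100, 0.415, 0.232, 0.391, 0.283]
# First components of the rows of Table 2 (Maclaurin coefficients).
coefficients = [0.100, 0.371, -0.253, 0.260, -0.207]

def rebuilt_from_coefficients(approx, coeffs, alpha):
    """Rebuild step k as approx[k-1] + alpha^k * coeffs[k] (Corollary 3)."""
    rebuilt = [coeffs[0]]
    for k in range(1, len(coeffs)):
        rebuilt.append(approx[k - 1] + alpha ** k * coeffs[k])
    return rebuilt

rebuilt = rebuilt_from_coefficients(approximations, coefficients, ALPHA)
max_gap = max(abs(a - b) for a, b in zip(approximations, rebuilt))

if __name__ == "__main__":
    for step, (a, b) in enumerate(zip(approximations, rebuilt)):
        print(step, a, round(b, 4))
```

The rebuilt values stay within 0.002 of the tabulated approximations, which is consistent with the three-decimal rounding of both tables.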
Figure 9: Examples of approximations obtained using a Maclaurin polynomial of degree 100, for nodes with different behaviours
(the points were tabulated by computing PageRank explicitly with 100 regularly spaced values of $\alpha$).